2016-06-26

Your turn

  • What is a (data) plot?
  • What are the three most important data plots?

Your turn

How would you describe this plot?

What about this one?

Using the package ggplot2

Elements of a plot

  • data
  • aesthetics: mapping of variables to graphical elements
  • geom: type of plot structure to use
  • transformations: log scale, …

Additional components

  • layers: multiple geoms, multiple data sets, annotation
  • facets: show subsets in different plots
  • themes: modifying style
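A minimal sketch of how these pieces combine in a single call. It uses the built-in mtcars data set as a stand-in (an assumption for illustration; it is not part of the workshop data), and the name p_elements is hypothetical.

```r
library(ggplot2)

# data: mtcars ships with R and is used here only as a stand-in
# aesthetics: wt -> x position, mpg -> y position, cyl -> colour
p_elements <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # geom: draw one point per row
  geom_point() +
  # layer: a second geom drawn from the same data and mappings
  geom_smooth(method = "lm", se = FALSE) +
  # transformation: log scale on the y axis
  scale_y_log10() +
  # facets: one panel per value of am
  facet_wrap(~am) +
  # theme: change the overall style
  theme_bw()

p_elements
```

Removing or swapping any one line changes only that element of the plot, which is the main appeal of building plots from a grammar.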

Why use a grammar of graphics?

Variable in the data is directly mapped to an element in the plot

Data - Autism

glimpse(autism)
# Observations: 604
# Variables: 7
# $ childid  (int) 1, 1, 1, 1, 1, 10, 10, 10, 10, 100, 100, 100, 100, 10...
# $ sicdegp  (fctr) high, high, high, high, high, low, low, low, low, hi...
# $ age2     (dbl) 0, 1, 3, 7, 11, 0, 1, 7, 11, 0, 1, 3, 7, 0, 1, 7, 11,...
# $ vsae     (int) 6, 7, 18, 25, 27, 9, 11, 18, 39, 15, 24, 37, 135, 8, ...
# $ gender   (fctr) male, male, male, male, male, male, male, male, male...
# $ race     (fctr) white, white, white, white, white, white, white, whi...
# $ bestest2 (fctr) pdd, pdd, pdd, pdd, pdd, autism, autism, autism, aut...

Plotting points

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_point()

Your turn

How is the data mapped to graphical elements?

  • data: _______
  • aesthetics: _________
  • geom: ________
  • transformations: _________

Jittering points

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_jitter()

Your turn

How is the data mapped to graphical elements?

  • data: _______
  • aesthetics: _________
  • geom: ________
  • transformations: _________

Adding lines

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_point() + geom_line()

Not the lines we want

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_point() + geom_line()

Too much ink

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_point() + geom_line(alpha=0.5)

Log scale y?

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_point() + geom_line(alpha=0.5) + scale_y_log10()

By age 2 diagnosis

ggplot(autism, aes(x=age2, y=vsae, group=childid, colour=bestest2)) + 
  geom_point() + geom_line(alpha=0.5) + scale_y_log10()

Refine groups

ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + 
  geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) + 
  geom_smooth(se=FALSE) +
  scale_y_log10()

Your turn

What do we learn about autism, age, and the diagnosis at age 2?

Your turn

How is the data mapped to graphical elements?

  • data: _______
  • aesthetics: _________
  • geom: ________
  • transformations: _________

A different look

That's not what I wanted ….

For each age measured

Which is better?

New example - Flying etiquette

41% Of Fliers Think You’re Rude If You Recline Your Seat

# Observations: 1,040
# Variables: 27
# $ RespondentID                                                                                                                             (dbl) ...
# $ How often do you travel by plane?                                                                                                        (chr) ...
# $ Do you ever recline your seat when you fly?                                                                                              (chr) ...
# $ How tall are you?                                                                                                                        (int) ...
# $ Do you have any children under 18?                                                                                                       (chr) ...
# $ In a row of three seats, who should get to use the two arm rests?                                                                        (chr) ...
# $ In a row of two seats, who should get to use the middle arm rest?                                                                        (chr) ...
# $ Who should have control over the window shade?                                                                                           (chr) ...
# $ Is itrude to move to an unsold seat on a plane?                                                                                          (chr) ...
# $ Generally speaking, is it rude to say more than a few words tothe stranger sitting next to you on a plane?                               (chr) ...
# $ On a 6 hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat?                             (chr) ...
# $ Under normal circumstances, does a person who reclines their seat during a flight have any obligation to the person sitting behind them? (chr) ...
# $ Is itrude to recline your seat on a plane?                                                                                               (chr) ...
# $ Given the opportunity, would you eliminate the possibility of reclining seats on planes entirely?                                        (chr) ...
# $ Is it rude to ask someone to switch seats with you in order to be closer to friends?                                                     (chr) ...
# $ Is itrude to ask someone to switch seats with you in order to be closer to family?                                                       (chr) ...
# $ Is it rude to wake a passenger up if you are trying to go to the bathroom?                                                               (chr) ...
# $ Is itrude to wake a passenger up if you are trying to walk around?                                                                       (chr) ...
# $ In general, is itrude to bring a baby on a plane?                                                                                        (chr) ...
# $ In general, is it rude to knowingly bring unruly children on a plane?                                                                    (chr) ...
# $ Have you ever used personal electronics during take off or landing in violation of a flight attendant's direction?                       (chr) ...
# $ Have you ever smoked a cigarette in an airplane bathroom when it was against the rules?                                                  (chr) ...
# $ Gender                                                                                                                                   (chr) ...
# $ Age                                                                                                                                      (chr) ...
# $ Household Income                                                                                                                         (chr) ...
# $ Education                                                                                                                                (chr) ...
# $ Location (Census Region)                                                                                                                 (chr) ...

Variables

Mix of categorical and quantitative variables. What mappings are appropriate? Bars (area) for counts of categories; side-by-side boxplots for a categorical-quantitative pair.

Support

ggplot(fly, aes(x=`How often do you travel by plane?`)) + 
  geom_bar() + coord_flip()

Categories are not sorted

Sorted categories

fly$`How often do you travel by plane?` <- 
  factor(fly$`How often do you travel by plane?`, levels=c(
    "Never","Once a year or less","Once a month or less",
    "A few times per month","A few times per week","Every day"))
ggplot(fly, aes(x=`How often do you travel by plane?`)) + geom_bar() + coord_flip()

Filter data

fly_sub <- fly %>% filter(`How often do you travel by plane?` %in% 
                            c("Once a year or less","Once a month or less")) %>%
  filter(!is.na(`Do you ever recline your seat when you fly?`)) %>%
  filter(!is.na(Age)) %>% filter(!is.na(Gender))

Recline by height

fly_sub$`Do you ever recline your seat when you fly?` <- factor(
  fly_sub$`Do you ever recline your seat when you fly?`, levels=c(
    "Never","Once in a while","About half the time",
    "Usually","Always"))
ggplot(fly_sub, aes(y=`How tall are you?`,
                    x=`Do you ever recline your seat when you fly?`)) + 
  geom_boxplot() + coord_flip()

Cheat sheet

Your turn

How many geoms are available in ggplot2? What is geom_rug?

Your turn

What is the difference between colour and fill?

Your turn

What does coord_fixed() do? What is the difference between this and using theme(aspect.ratio=...)?

Your turn

What are scales? How many numeric transformation scales are there?

Your turn

What are position adjustments? When would they be used?

Your turn

Use your cheat sheet to work out how to make a plot to explore the relationship between

Do you ever recline your seat when you fly? and Is itrude to recline your seat on a plane?

Facets

ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar() + coord_flip() + facet_wrap(~Gender)

Facets

fly_sub$Age <- factor(fly_sub$Age, levels=c("18-29","30-44","45-60","> 60"))
ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar() + coord_flip() + facet_grid(Age~Gender)

Color palettes - default

p <- ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
                    fill=Gender)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)
p

What do we learn?

Color palettes - brewer

p + scale_fill_brewer(palette="Dark2") 

Color blind-proofing

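library(scales)
library(dichromat)
# Default colours, legend hidden
p + theme(legend.position = "none")
# The default hues as simulated for red-green colour blindness
# (dichromat() defaults to type="deutan")
clrs <- dichromat(hue_pal()(3))
p + scale_fill_manual("", values=clrs) + theme(legend.position = "none")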
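The check above covers only the default type of colour blindness. As a hedged sketch, the same three default hues can be simulated for each type that dichromat() supports and shown side by side; the clrs_* names are hypothetical.

```r
library(scales)
library(dichromat)

# The three default ggplot2 hues, as used above
clrs_default <- hue_pal()(3)

# Simulated appearance under each type supported by dichromat()
clrs_deutan <- dichromat(clrs_default, type = "deutan")
clrs_protan <- dichromat(clrs_default, type = "protan")
clrs_tritan <- dichromat(clrs_default, type = "tritan")

# One row per palette: default, deutan, protan, tritan
show_col(c(clrs_default, clrs_deutan, clrs_protan, clrs_tritan), ncol = 3)
```

If two swatches in a row look alike, that palette probably needs changing before it is used for that audience.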

Perceptual principles

  • Hierarchy of mappings: (first) position along an axis - (last) color (Cleveland, 1984; Heer and Bostock, 2009)
  • Pre-attentive: Some elements are noticed before you even realise it.
  • Color: (pre-attentive) palettes - qualitative, sequential, diverging.
  • Proximity: Place elements for primary comparison close together.
  • Change blindness: When focus is interrupted differences may not be noticed.

Hierarchy of mappings

  1. Position - common scale (BEST)
  2. Position - nonaligned scale
  3. Length, direction, angle
  4. Area
  5. Volume, curvature
  6. Shading, color (WORST)
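To see the two ends of this ranking, the sketch below draws the same five values twice: once as position on a common scale and once as shading. The data frame rank_demo is hypothetical, invented only for this comparison.

```r
library(ggplot2)

# Hypothetical values, invented only to illustrate the comparison
rank_demo <- data.frame(item = c("A", "B", "C", "D", "E"),
                        value = c(20, 22, 25, 23, 21))

# 1. Position on a common scale: small differences can be read off the axis
p_position <- ggplot(rank_demo, aes(x = item, y = value)) +
  geom_point(size = 3)

# 6. Shading: the same values mapped to fill are much harder to compare
p_shading <- ggplot(rank_demo, aes(x = item, y = 1, fill = value)) +
  geom_tile()

p_position
p_shading
```

Try ordering the items from smallest to largest using each plot in turn.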

Pre-attentive

Can you find the odd one out?

Is it easier now?

Color palettes

  • Qualitative: categorical variables
  • Sequential: low to high numeric values
  • Diverging: negative to positive values
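A minimal sketch of one scale for each palette type. The data frame pal_demo and its columns are hypothetical; the palette names "Dark2" and "Blues" are ColorBrewer palettes, and scale_fill_gradient2() is used here as one possible diverging scale.

```r
library(ggplot2)

# Hypothetical data, invented for illustration only
pal_demo <- data.frame(group = c("a", "b", "c"),
                       level = c(1, 5, 9),
                       change = c(-2, 0, 2))

# Qualitative: distinct hues for a categorical variable
p_qual <- ggplot(pal_demo, aes(x = group, y = 1, fill = group)) +
  geom_tile() + scale_fill_brewer(palette = "Dark2")

# Sequential: light to dark for low to high numeric values
p_seq <- ggplot(pal_demo, aes(x = group, y = 1, fill = level)) +
  geom_tile() + scale_fill_distiller(palette = "Blues", direction = 1)

# Diverging: two hues that meet at a neutral midpoint of zero
p_div <- ggplot(pal_demo, aes(x = group, y = 1, fill = change)) +
  geom_tile() + scale_fill_gradient2(midpoint = 0)

p_qual
p_seq
p_div
```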

Proximity

ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
                    fill=Gender)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)

With this arrangement we can see the proportion of each gender within each rudeness category, and compare these proportions across age groups. How could we arrange this differently?

Proximity

ggplot(fly_sub, aes(x=Gender,
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) + theme(legend.position="bottom")

What is different about the comparison now?

Another arrangement

ggplot(fly_sub, aes(x=Age,
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Gender, ncol=5) + theme(legend.position="bottom")

Themes

The ggthemes package has many different styles for plots. Other packages, such as xkcd, skittles, wesanderson, beyonce, …, provide further themes and colour palettes.

library(xkcd)
ggplot(fly_sub, aes(x=Gender,
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
  theme_xkcd() + theme(legend.position="bottom")

Your turn

Compile the rmarkdown document that you have put together thus far in the workshop!

Resources

Share and share alike

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.